Skip to content

GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords#28302

Merged
yuslepukhin merged 6 commits intomainfrom
fix/IntegerOverFlow-IcM31000000575970
May 2, 2026
Merged

GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords#28302
yuslepukhin merged 6 commits intomainfrom
fix/IntegerOverFlow-IcM31000000575970

Conversation

@GopalakrishnanN
Copy link
Copy Markdown
Contributor

Description

Hardens the CPU GridSample operator against undefined behavior caused by static_cast<int64_t> of NaN, ±Inf, or out-of-range floating-point coordinates. Tracks IcM 31000000575970.

Motivation and Context

GridSample previously fed denormalized grid coordinates directly into integer casts (std::floor, std::nearbyint) inside its 2D and 3D Compute paths, the bilinear fast path, and GsReflect. With pathological grid values (NaN, ±Inf, magnitudes beyond INT64_MAX), the casts invoke C++ undefined behavior. On some toolchains/sanitizers this corrupts indices, leading to out-of-bounds reads (a security concern) and on most platforms it just silently returns garbage indices that PixelAtGrid later clamps.

Changes

onnxruntime/core/providers/cpu/tensor/grid_sample.cc

  • GsReflect: explicit guard that returns x_min when the input is non-finite or range <= 0 (covers align_corners=true with a 1×1 image where reflection range collapses to zero).
  • New helper IsSafeForInt64Conversion<T>(v): rejects NaN/Inf and any magnitude > 2⁶².
  • 2D Compute loop: sanitizes unsafe x/y to x_min/y_min before all static_cast<int64_t> calls (Nearest, Linear, Cubic).
  • 3D Compute loop: same sanitization for x/y/z.
  • PrecomputeBilinearSamplePlan2D (bilinear fast path): substitutes -0.5 for unsafe coords; the existing mask logic correctly rejects the resulting out-of-range neighbors.
  • Adds <algorithm> and <cmath> includes.

onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc

  • New standalone TU (renamed from a previously included .inc) with MIT copyright header.
  • Adds 7 regression tests across <float, MLFloat16>:
    • NaN coordinates (nearest + reflection)
    • ±Inf coordinates (bilinear + reflection)
    • Extreme finite coordinates (bilinear + reflection)
    • Cubic + reflection with extreme coordinates (float-only by ONNX type constraint)
    • 3D / 5-D nearest + reflection with extreme coordinates
    • Bilinear + border with extreme coordinates
    • 1×1 image + align_corners=1 (zero reflection range)

cmake/onnxruntime_unittests.cmake

  • Adds the new .cc test to the Emscripten exclusion list alongside the existing grid_sample_test.cc.

Validation

  • Local Windows Release build: cmake --build . --target onnxruntime_provider_test.
  • All *GridSample* tests pass: 116 (96 generated + 8 contrib op + 12 custom across float/MLFloat16).
  • Other unrelated test failures observed in the full provider test run are pre-existing and unrelated to GridSample.

Related

Supersedes the work in #27975 (left open). Filed via origin remote per workflow guidance.

Gopalakrishnan Nallasamy added 3 commits April 30, 2026 14:07
…inates

Extreme grid coordinates could produce out-of-range indices in the reflection branch of PixelAtGrid. Two minimal changes:

1. In GsReflect, widen the int n = static_cast<int>(dx / range) computation to int64_t. For large (but finite) inputs, dx / range can exceed INT_MAX, making the narrow int cast undefined behavior.

2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer, guaranteeing the final load stays in bounds.

Adds a regression test covering reflection padding with large-magnitude grid coordinates.
… CPU

Addresses yuslepukhin's review comments on PR #27975:

- GsReflect: add explicit guard for non-finite input or non-positive range
  (returns x_min) before the existing reduction logic so the int64 cast
  inside GsReflect is never invoked on NaN/Inf or with range == 0.
- Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe
  coordinates to the in-range border value (x_min/y_min/z_min) before
  the float->int64 conversions in:
    * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic),
    * GridSample<T>::Compute 3D loop,
    * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the
      mask logic continues to reject the out-of-range neighbors).
  This eliminates undefined behavior from static_cast<int64_t> on values
  outside [INT64_MIN, INT64_MAX] or NaN/Inf.
- Add <algorithm> and <cmath> includes.
- Expand custom tests to cover NaN, +/-Inf, extreme finite coords across
  bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border,
  and dim=1 with align_corners=1 (range == 0 in GsReflect).
Promote the custom GridSample tests from a header-style .inc included
into grid_sample_test.cc to a standalone translation unit:
- Add MIT copyright header, includes, and namespace wrapping.
- Move GetExecutionProviders/RunTests helpers into an anonymous
  namespace inside the new file (separate internal-linkage copies, so
  there is no ODR conflict with the versions in grid_sample_test.cc).
- Remove the #include from grid_sample_test.cc.
- Add the new file to the Emscripten exclusion list in
  cmake/onnxruntime_unittests.cmake alongside grid_sample_test.cc.
The pathological-coordinate regression tests (NaN/Inf/extreme grid values) target CPU-side hardening. Running them against non-CPU EPs (CUDA/CoreML/iOS/WebGPU) introduces unrelated backend behavior and broad provider-test failure cascades. Restrict those tests to the CPU EP via a new RunCpuOnly helper.

The cubic test is also converted to a non-typed TEST over float only, since ONNX GridSample cubic mode constrains T1 to float (the MLFloat16 instantiation was a no-op).

The two boundary-condition tests (linear_zeros_mixed_bounds_*) continue to run on all available EPs as cross-EP correctness checks.
@yuslepukhin
Copy link
Copy Markdown
Member

Gaps in test coverage:

  • zeros padding + extreme/NaN/Inf coordinates: Not tested. While zeros mode is safe by design, a regression test would confirm the IsSafeForInt64Conversion sanitization doesn't break it.
  • cubic + NaN/Inf: Only extreme-finite is tested for cubic. NaN and Inf paths through cubic interpolation are not covered.
  • 3D (5D) + linear mode + extreme coordinates: Only nearest mode is tested for 3D. The 3D trilinear interpolation path with extreme coords is not covered.
  • 3D + NaN/Inf coordinates: Not tested at all for the 3D path.
  • align_corners=1 with non-degenerate images + extreme coordinates: Only the degenerate 1×1 case is tested with align_corners=1.
  • Negative extreme coordinates asymmetry: Most tests use symmetric positive/negative extremes. A test with only-negative extreme coordinates in a non-constant image would verify correct reflection direction.
  • Mixed normal + pathological coordinates in the same grid: All pathological tests use constant-valued images (all 1.0), making the expected output trivially deterministic. A test with a non-constant image and a mix of normal + pathological grid points would validate that normal sampling is unaffected by the sanitization.

@yuslepukhin
Copy link
Copy Markdown
Member

CUDA is not fixed.

The CUDA implementation in grid_sample_impl.cu has the exact same vulnerability (this is a contrib op under onnxruntime::contrib::cuda):

Additionally:

  • No NaN/Inf guard in GsReflect
  • No safety clamp in PixelAtGrid after the reflection branch
  • No IsSafeForInt64Conversion check before static_cast<int64_t> calls in the compute kernels
    The CUDA implementation needs an equivalent fix. While GPU UB is less dangerous than CPU UB (no ASAN-detectable heap corruption), it still produces garbage output silently and can crash the GPU context.

WebGPU/JS implementations are not vulnerable — WGSL integer conversions have defined saturation semantics.

File a follow-up for CUDA.

Comment thread cmake/onnxruntime_unittests.cmake
Add 7 new CPU-only regression tests for the float->int64 cast hardening:

- linear+zeros+extreme/NaN/Inf: confirms IsSafeForInt64Conversion does not break the bilinear zeros-padding fast path.

- cubic+NaN/Inf: covers NaN/+Inf/-Inf paths through the cubic kernel (float-only).

- 5D linear+extreme: 3D trilinear path with extreme finite coordinates.

- 5D nearest+NaN/Inf: 3D path with non-finite values along different axes.

- align_corners=1 non-degenerate (3x3) image with extreme/NaN coords: verifies sanitization to x_min=0 maps to pixel(0,0).

- negative-only extreme on non-constant image: verifies sanitization is direction-agnostic and rounds to pixel(0,0).

- mixed normal+pathological grid in one call on a non-constant image: verifies normal points sample correctly and only unsafe coords are sanitized.
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc Outdated
Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc Fixed
@GopalakrishnanN
Copy link
Copy Markdown
Contributor Author

Work item to cover the same vulnerability in CUDA - https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/93864

Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants